Abstract

Recurrence formulas are presented for studying the accuracy of the Fitch method for reconstructing 
the ancestral states in a given phylogenetic tree. As their applications, we analyze the convergence of the 
accuracy of reconstructing the root state in a complete binary tree of $2^n$ leaves as $n$ goes to infinity and also give
a lower bound on the accuracy of reconstructing the root state in an ultrametric tree. 

Keywords Ancestral state reconstruction, analysis of reconstruction accuracy, Fitch method, phylogenetic 
trees. 

1 Introduction 

Ancestral sequence reconstruction incorporates sequences from modern living things into evolutionary models
to estimate the corresponding sequence of an ancestor that died millions of years ago. This approach to
understanding proteins was first suggested by Zuckerkandl and Pauling in their seminal work [7] in 1963. With
the rapid accumulation of biomolecular sequence data and advances in computational biology, it has become
an important approach to studying the origin and evolution of genes, proteins and even whole genomes (see,
for example, [5] and [10]).

The Fitch method [3] is the first phylogenetic technique used for inferring the ancestral states of a
character when the phylogeny that relates the ancestor to the extant species is known [I]. As a parsimony 
method, it estimates the ancestral state by minimizing the total number of hypothetical substitutions in all 
branches that are used to explain the evolution of the character states. It is efficient and accurate for sequences 
that are reasonably similar to each other. However, the accuracy of the Fitch method for reconstructing 
ancestral states has yet to be well studied [H [HI [H] ■ 

In this work, we present a set of recurrence formulas for analyzing the reconstruction accuracy of the 
Fitch method (in Theorem 3.1). These formulas are derived from a work of Maddison [6] (also see [9]).
They are simple and useful as demonstrated in solving two theoretical problems that arise from studying the
reconstruction accuracy of the Fitch method. 

The first problem is to analyze the convergence of the accuracy of the Fitch method for reconstructing the 
root state in a complete phylogenetic tree in the equal-length branch and two-state Jukes-Cantor model (see 
Section 2 for details). Let $p$ denote the conservation rate in each branch. In [9], Steel showed that, when the
Fitch method is applied, the accuracy of reconstructing the root state from all leaf states in the complete
binary tree of $2^n$ leaves converges, as $n$ goes to infinity, to $\frac{1}{2}$ if $\frac{1}{2} \le p \le \frac{7}{8}$ and to $\frac{1}{2} + \frac{\sqrt{(8p-7)(4p-3)}}{2(2p-1)^2}$ if $\frac{7}{8} \le p \le 1$.
This result was proved under the assumption that suitable limits exist. However, the existence of these limits
is not trivial. In this paper, we fill the gap left in [9] by proving that these limits exist. In addition, we also
show that the reconstruction accuracy diverges when $p < \frac{1}{8}$.

Complete phylogenetic trees in which all branches have equal length are special ultrametric trees. In an
ultrametric tree, each branch $e$ has its own branch length $l(e)$, with conservation rate $p(e) = \frac{1}{2}\left(1 + e^{-2\lambda l(e)}\right)$ in
the two-state Jukes-Cantor model (where $\lambda$ is the substitution rate, as in Section 5), and the sum of branch lengths is required to be the same along every path from
the root to a leaf. A counterintuitive fact is that the reconstruction accuracy of the Fitch method is not a
monotonic function of the number of taxa selected for reconstruction of the root state (even for ultrametric trees)
[1]. Hence, Li et al. asked whether, in an ultrametric tree, the accuracy $RA_F$ of the Fitch method for reconstructing the root state
from all leaf states is always larger than or equal to the conservation rate along a root-to-leaf path.
Recently, this question was answered affirmatively by Fischer and Thatte [2]. In the second
part of this paper, we present a stronger lower bound on $RA_F$ for arbitrary ultrametric trees. Our bound
implies that $RA_F$ is not less than the accuracy of reconstructing the root state from any three leaves in an
ultrametric tree.



2 The Fitch method and its reconstruction accuracy 

Let $C$ be a character with multiple states. Given a phylogenetic tree $T$ of the character $C$ in which each leaf
has a state, the Fitch method estimates the root state from the leaf states in two steps. It first computes a
subset $S_u$ of states for each node $u$ of $T$ as follows:

1. If $u$ is a leaf, $S_u$ contains only the state of $u$;

2. If $u$ is an internal node having children $v$ and $w$, $S_u$ is equal to $S_v \cup S_w$ if $S_v$ and $S_w$ are disjoint and
$S_v \cap S_w$ otherwise.

After the subset $S_r$ for the root $r$ of $T$ is computed, the method selects a state from $S_r$
uniformly at random as the root state. In other words, each state in $S_r$ is selected as the root state with probability $1/|S_r|$, where $|S_r|$ denotes the
number of states contained in $S_r$.
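The two steps above can be sketched in a few lines of Python. This is only an illustrative sketch: the tree encoding (a leaf is its observed state, an internal node is a pair of subtrees) and the function names `fitch_set` and `fitch_root_state` are assumptions made for this example, not notation from the paper.

```python
import random

def fitch_set(node):
    """Return the Fitch state subset S_u for `node`.

    Assumed encoding: a leaf is its observed state (an int);
    an internal node is a pair (left_subtree, right_subtree).
    """
    if not isinstance(node, tuple):
        return frozenset([node])          # rule 1: S_u = {state of u}
    s_v = fitch_set(node[0])
    s_w = fitch_set(node[1])
    common = s_v & s_w
    # rule 2: union if the child subsets are disjoint, intersection otherwise
    return common if common else s_v | s_w

def fitch_root_state(tree, rng=random):
    """Second step: pick one state uniformly at random from the root subset S_r."""
    return rng.choice(sorted(fitch_set(tree)))

example = ((0, 1), (1, 1))
print(sorted(fitch_set(example)))         # [1]
```

In the example, the left cherry has disjoint leaf states and yields $\{0, 1\}$, the right cherry yields $\{1\}$, and the root takes their intersection $\{1\}$.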

Assume the mutation process along each branch of the given tree is modeled as a stochastic process in
which a state is replaced by another with some probability. The Fitch method reconstructs correctly a root
state $s$ from a set $D$ of leaf states only if $s$ evolves into the leaf states in $D$. Hence, the accuracy of the
Fitch method for reconstructing the state of the root of $T$, denoted by $RA_F(T)$, is defined to be the expected
probability that the Fitch method outputs a true state from a set $D$ of leaf states. Let $\Pr_T[D \mid s]$ denote the
probability that the root state $s$ evolves into the leaf states in $D$. Then,

$$RA_F(T) = \sum_{s, D} \Pr(s)\, \Pr_T[D \mid s]\, \Pr[s \text{ is output from } D], \tag{1}$$

where $\Pr(s)$ is the prior probability of $s$ being the root state.
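For small trees, formula (1) can be evaluated directly by enumerating all leaf-state patterns. The sketch below does this for two states with equal priors; the tree encoding (`LEAF` marker, internal node as a pair of `(child, conservation_probability)` entries) and all function names are assumptions introduced for illustration.

```python
LEAF = None  # hypothetical marker for a leaf node

def leaf_patterns(node, state):
    """Map each tuple D of leaf states below `node` to Pr[D | node is in `state`]."""
    if node is LEAF:
        return {(state,): 1.0}
    dists = []
    for child, p in node:                 # node = ((child, p), (child, p))
        dist = {}
        # the child keeps the parent's state with probability p, flips otherwise
        for child_state, prob in ((state, p), (1 - state, 1.0 - p)):
            for pattern, q in leaf_patterns(child, child_state).items():
                dist[pattern] = dist.get(pattern, 0.0) + prob * q
        dists.append(dist)
    left, right = dists
    return {a + b: x * y for a, x in left.items() for b, y in right.items()}

def fitch_set(node, states):
    """Fitch subset at `node`; `states` iterates over leaf states from left to right."""
    if node is LEAF:
        return frozenset([next(states)])
    s_v = fitch_set(node[0][0], states)
    s_w = fitch_set(node[1][0], states)
    common = s_v & s_w
    return common if common else s_v | s_w

def accuracy(tree):
    """Evaluate formula (1) with Pr(s) = 1/2 for s in {0, 1}."""
    total = 0.0
    for s in (0, 1):
        for pattern, prob in leaf_patterns(tree, s).items():
            s_r = fitch_set(tree, iter(pattern))
            if s in s_r:                  # s is output with probability 1/|S_r|
                total += 0.5 * prob / len(s_r)
    return total

cherry = ((LEAF, 0.9), (LEAF, 0.9))
```

For the two-leaf tree `cherry` with conservation probability 0.9 on both branches, the enumeration gives an accuracy of 0.9 up to rounding. The enumeration is exponential in the number of leaves, so it is only useful as a cross-check on small trees.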
3 Recurrence formulas for analyzing the reconstruction accuracy

In the rest of this paper, we assume that the character has only two states, 0 and 1, and that the root takes these two
states with equal prior probability. By definition, the Fitch method selects 1 with probability 1 if $\{1\}$ is the
state subset $S_r(D)$ computed from $D$ at the root in the first step, and it selects 1
with probability $\frac{1}{2}$ if $S_r(D) = \{0, 1\}$. Therefore, by symmetry, (1) becomes

$$RA_F(T) = \sum_{D} \Pr_T[D \mid 1] \left( \Pr[S_r(D) = \{1\}] + \tfrac{1}{2} \Pr[S_r(D) = \{0, 1\}] \right). \tag{2}$$

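Let

$$\Pr_X[S \mid s] = \sum_{D'} \Pr_X[D' \mid s]\, \Pr[S_X(D') = S]$$

for a node $X$, a state $s \in \{0, 1\}$, a subset $S = \{0\}, \{1\}, \{0, 1\}$, and $D'$ ranging over the possible states of the leaves below $X$. $\Pr_X[S \mid s]$
is the probability that the Fitch method outputs state subset $S$ at $X$ in its first step given that the true state of
$X$ is $s$. By symmetry,

$$\begin{aligned}
\Pr_X[\{1\} \mid 1] &= \Pr_X[\{0\} \mid 0], \\
\Pr_X[\{0\} \mid 1] &= \Pr_X[\{1\} \mid 0], \\
\Pr_X[\{0, 1\} \mid 1] &= 1 - \Pr_X[\{1\} \mid 1] - \Pr_X[\{0\} \mid 1], \\
\Pr_X[\{0, 1\} \mid 0] &= 1 - \Pr_X[\{1\} \mid 0] - \Pr_X[\{0\} \mid 0].
\end{aligned}$$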



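For a node $X$ and a state $s = 0, 1$, we further set

$$\alpha_X = \Pr_X[\{s\} \mid s], \qquad \beta_X = \Pr_X[\{1 - s\} \mid s].$$

Then,

$$\Pr_X[\{0, 1\} \mid s] = 1 - \alpha_X - \beta_X.$$

Then, (2) becomes

$$\begin{aligned}
RA_F(T) &= \Pr_r[\{1\} \mid 1] + \tfrac{1}{2} \Pr_r[\{0, 1\} \mid 1] \\
&= \tfrac{1}{2} + \tfrac{1}{2} \left( \Pr_r[\{1\} \mid 1] - \Pr_r[\{0\} \mid 1] \right) \\
&= \tfrac{1}{2} + \tfrac{1}{2} (\alpha_r - \beta_r).
\end{aligned} \tag{3}$$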



Let $Z$ be an internal node with children $X$ and $Y$. Furthermore, let the conservation
probabilities on branches $ZX$ and $ZY$ be $p_X$ and $p_Y$, respectively. The subset $S_Z$ computed at $Z$ is $\{1\}$ if and
only if one of $S_X$ and $S_Y$ is $\{1\}$ and the other is $\{1\}$ or $\{0, 1\}$. Hence,

$$\begin{aligned}
\alpha_Z ={}& (p_X \alpha_X + q_X \beta_X)(p_Y \alpha_Y + q_Y \beta_Y) \\
&+ (p_X \alpha_X + q_X \beta_X)(1 - \alpha_Y - \beta_Y) \\
&+ (1 - \alpha_X - \beta_X)(p_Y \alpha_Y + q_Y \beta_Y),
\end{aligned} \tag{4}$$

where $q_X = 1 - p_X$ and $q_Y = 1 - p_Y$. Similarly,

$$\begin{aligned}
\beta_Z ={}& (q_X \alpha_X + p_X \beta_X)(q_Y \alpha_Y + p_Y \beta_Y) \\
&+ (q_X \alpha_X + p_X \beta_X)(1 - \alpha_Y - \beta_Y) \\
&+ (1 - \alpha_X - \beta_X)(q_Y \alpha_Y + p_Y \beta_Y).
\end{aligned} \tag{5}$$

These two recurrence relations, presented in [6], lead to an efficient dynamic programming method for calculating
$\alpha_r$ and $\beta_r$. However, these two relations are not simple enough for the theoretical study of the reconstruction
accuracy. In the rest of this section, we shall establish two recurrence relations for the purpose of the theoretical
analysis.
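The dynamic programming method based on (4) and (5) can be sketched as a single bottom-up recursion. The tree encoding used here (a leaf is `None`, an internal node is a pair of `(child, conservation_probability)` entries) and the function names are assumptions made for this illustration.

```python
def alpha_beta(node):
    """Return (alpha, beta) at `node` using recurrences (4) and (5).

    Assumed encoding: a leaf is None; an internal node is
    ((child_X, p_X), (child_Y, p_Y)).
    """
    if node is None:
        return 1.0, 0.0                   # a leaf always reports its own state
    (x, p_x), (y, p_y) = node
    a_x, b_x = alpha_beta(x)
    a_y, b_y = alpha_beta(y)
    q_x, q_y = 1.0 - p_x, 1.0 - p_y
    same_x = p_x * a_x + q_x * b_x        # Pr[S_X = {s}   | Z is in state s]
    same_y = p_y * a_y + q_y * b_y
    flip_x = q_x * a_x + p_x * b_x        # Pr[S_X = {1-s} | Z is in state s]
    flip_y = q_y * a_y + p_y * b_y
    both_x = 1.0 - a_x - b_x              # Pr[S_X = {0,1} | Z is in state s]
    both_y = 1.0 - a_y - b_y
    alpha = same_x * same_y + same_x * both_y + both_x * same_y   # formula (4)
    beta = flip_x * flip_y + flip_x * both_y + both_x * flip_y    # formula (5)
    return alpha, beta

def accuracy(tree):
    """Reconstruction accuracy at the root via formula (3)."""
    alpha, beta = alpha_beta(tree)
    return 0.5 + 0.5 * (alpha - beta)

cherry = ((None, 0.9), (None, 0.9))
```

Each node is visited once, so the running time is linear in the size of the tree. For `cherry`, the recursion gives $\alpha = 0.81$ and $\beta = 0.01$, hence an accuracy of 0.9 up to rounding.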
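Let

$$C_Z = 1 - \alpha_Z - \beta_Z$$

and

$$D_Z = \alpha_Z - \beta_Z.$$

If $Z$ is a leaf, then $\alpha_Z = 1$ and $\beta_Z = 0$, so we have that

$$C_Z = 0, \qquad D_Z = 1. \tag{6}$$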
Otherwise, we have the following recurrence relations. 

Theorem 3.1 Let $Z$ be an internal node with children $X$ and $Y$. Then,

$$C_Z = \tfrac{1}{2} \left[ 1 - C_X - C_Y + 3 C_X C_Y - (2p_X - 1)(2p_Y - 1) D_X D_Y \right], \tag{7}$$

and

$$D_Z = \tfrac{1}{2} (2p_X - 1)(1 + C_Y) D_X + \tfrac{1}{2} (2p_Y - 1)(1 + C_X) D_Y. \tag{8}$$

Proof. These two relations can be verified by using (4) and (5). The details can be found in the Appendix. □
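As a numerical sanity check (not a substitute for the proof in the Appendix), one can compare the recurrences (7) and (8) against $C_Z = 1 - \alpha_Z - \beta_Z$ and $D_Z = \alpha_Z - \beta_Z$ computed from (4) and (5) on randomly generated trees. The tree encoding, the random tree generator and all function names below are assumptions made for this sketch.

```python
import random

def alpha_beta(node):
    """(alpha, beta) via (4)-(5); a leaf is None, an internal node is ((X, p_X), (Y, p_Y))."""
    if node is None:
        return 1.0, 0.0
    (x, p_x), (y, p_y) = node
    a_x, b_x = alpha_beta(x)
    a_y, b_y = alpha_beta(y)
    same_x, flip_x = p_x * a_x + (1 - p_x) * b_x, (1 - p_x) * a_x + p_x * b_x
    same_y, flip_y = p_y * a_y + (1 - p_y) * b_y, (1 - p_y) * a_y + p_y * b_y
    both_x, both_y = 1 - a_x - b_x, 1 - a_y - b_y
    alpha = same_x * same_y + same_x * both_y + both_x * same_y
    beta = flip_x * flip_y + flip_x * both_y + both_x * flip_y
    return alpha, beta

def c_d(node):
    """(C, D) via the recurrences (7)-(8), starting from (6) at the leaves."""
    if node is None:
        return 0.0, 1.0
    (x, p_x), (y, p_y) = node
    c_x, d_x = c_d(x)
    c_y, d_y = c_d(y)
    m_x, m_y = 2 * p_x - 1, 2 * p_y - 1
    c = 0.5 * (1 - c_x - c_y + 3 * c_x * c_y - m_x * m_y * d_x * d_y)
    d = 0.5 * m_x * (1 + c_y) * d_x + 0.5 * m_y * (1 + c_x) * d_y
    return c, d

def random_tree(depth, rng):
    """Random binary tree of bounded depth with random conservation probabilities."""
    if depth == 0 or rng.random() < 0.3:
        return None
    return ((random_tree(depth - 1, rng), rng.random()),
            (random_tree(depth - 1, rng), rng.random()))

def max_discrepancy(trials=200, seed=0):
    """Largest gap between (7)-(8) and the values derived from (4)-(5)."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        tree = random_tree(5, rng)
        a, b = alpha_beta(tree)
        c, d = c_d(tree)
        worst = max(worst, abs(c - (1 - a - b)), abs(d - (a - b)))
    return worst
```

On such trees the two computations should agree up to floating-point rounding, which is what `max_discrepancy()` measures.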

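As the first application of this theorem, we obtain the following fact. This result can be found in [9]. Here
we give a short proof.

Corollary 3.1 For any phylogenetic tree $T$ with root $r$ in which the conservation probability is at least
$\frac{1}{2}$ on every branch, $\Pr_r[\{0, 1\} \mid s] = C_r \le \frac{1}{2}$ for $s = 0, 1$.

Proof. We prove the fact by induction on $n$, the number of nodes of $T$. For a single-node tree, the fact follows from
(6). Suppose $C_r \le \frac{1}{2}$ for any tree with fewer than $n$ nodes. Now, consider a phylogenetic tree $T$ of $n$ nodes.
Let the root $r$ of $T$ have children $X$ and $Y$. Then, by induction, $0 \le C_X, C_Y \le \frac{1}{2}$. Since $p_X, p_Y \ge 1/2$ (and hence $D_X, D_Y \ge 0$ by (6) and (8)), by
Formula (7),

$$\begin{aligned}
C_r &= \tfrac{1}{2} \left[ \tfrac{2}{3} + 3 \left( C_X - \tfrac{1}{3} \right) \left( C_Y - \tfrac{1}{3} \right) - (2p_X - 1)(2p_Y - 1) D_X D_Y \right] \\
&\le \tfrac{1}{2} \left[ \tfrac{2}{3} + 3 \left| C_X - \tfrac{1}{3} \right| \times \left| C_Y - \tfrac{1}{3} \right| \right] \\
&\le \tfrac{1}{2} \left[ \tfrac{2}{3} + 3 \times \left( \tfrac{1}{3} \right)^2 \right] \\
&= \tfrac{1}{2}.
\end{aligned}$$

Hence, the fact holds. □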



4 Accuracy on complete binary trees 

In this section, we study the reconstruction accuracy of the Fitch method on complete binary trees. Let
$T_n$ be the complete binary tree of $2^n$ leaves in which the conservation probability is $p$ along each branch. Let
$r$ denote the root of $T_n$, and let $C_n(p) = C_r$ and $D_n(p) = D_r$ in $T_n$. Since the subtree rooted at each child of the
root in $T_n$ is the complete binary tree of $2^{n-1}$ leaves, (7) and (8) imply that, for $n \ge 1$,

$$\begin{aligned}
2C_n(p) &= 1 - 2C_{n-1}(p) + 3C_{n-1}^2(p) - (2p - 1)^2 D_{n-1}^2(p), \\
D_n(p) &= (2p - 1)\left(1 + C_{n-1}(p)\right) D_{n-1}(p),
\end{aligned} \tag{9}$$

where $0 \le p \le 1$.

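Lemma 4.1 For any $n \ge 1$ and $0 \le p \le 1$,

$$C_n(p) = C_n(1 - p), \qquad |D_n(p)| = |D_n(1 - p)|.$$

Proof. We prove by induction on $n$. For $n = 0$, the facts follow from Formula (6).

Suppose now the lemma is true for $n - 1$; that is, $C_{n-1}(p) = C_{n-1}(1 - p)$ and $|D_{n-1}(p)| = |D_{n-1}(1 - p)|$.
Then

$$\begin{aligned}
2C_n(p) &= 1 - 2C_{n-1}(p) + 3C_{n-1}^2(p) - (2p - 1)^2 D_{n-1}^2(p) \\
&= 1 - 2C_{n-1}(1 - p) + 3C_{n-1}^2(1 - p) - \left(2(1 - p) - 1\right)^2 D_{n-1}^2(1 - p) \\
&= 2C_n(1 - p)
\end{aligned}$$

and

$$\begin{aligned}
|D_n(p)| &= |2p - 1| \cdot \left(1 + C_{n-1}(p)\right) \cdot |D_{n-1}(p)| \\
&= |2(1 - p) - 1| \cdot \left(1 + C_{n-1}(1 - p)\right) \cdot |D_{n-1}(1 - p)| \\
&= |D_n(1 - p)|,
\end{aligned}$$

from which Lemma 4.1 follows by induction. □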
By Lemma 4.1, we have

$$\lim_{n \to \infty} C_n(p) = \lim_{n \to \infty} C_n(1 - p)$$

and

$$\lim_{n \to \infty} |D_n(p)| = \lim_{n \to \infty} |D_n(1 - p)|,$$

if all the above limits exist. Therefore, it suffices to assume that $1/2 \le p \le 1$. Now we simplify our notation
by dropping $p$ from the two equalities in (9), resulting in

$$2C_n = 1 - 2C_{n-1} + 3C_{n-1}^2 - (2p - 1)^2 D_{n-1}^2, \tag{10}$$

$$D_n = (2p - 1)(1 + C_{n-1}) D_{n-1}. \tag{11}$$
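The recurrences (10) and (11) are easy to iterate numerically, which gives a quick way to observe the behaviour analysed in the rest of this section. The function name `iterate_c_d` is an assumption made for this sketch; the starting values are the leaf values from (6).

```python
def iterate_c_d(p, n_max):
    """Return [(C_0, D_0), ..., (C_n_max, D_n_max)] for the complete binary trees T_n."""
    c, d = 0.0, 1.0                       # leaf values, formula (6)
    values = [(c, d)]
    m = 2.0 * p - 1.0
    for _ in range(n_max):
        # both right-hand sides are evaluated with the previous (C_{n-1}, D_{n-1})
        c, d = 0.5 * (1 - 2 * c + 3 * c * c - (m * d) ** 2), m * (1 + c) * d
        values.append((c, d))
    return values

c_low, d_low = iterate_c_d(0.8, 300)[-1]      # a value of p below 7/8
c_high, d_high = iterate_c_d(0.95, 300)[-1]   # a value of p above 7/8
```

For $p = 0.8$ the iterates approach $C_n \approx 1/3$ and $D_n \approx 0$, while for $p = 0.95$ they approach $C_n \approx 2(1-p)/(2p-1)$ with $D_n$ bounded away from zero, in line with Theorems 4.1 and 4.2 below. Convergence is slow for $p$ close to $7/8$, so many more iterations would be needed there.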
Lemma 4.2 For any $n \ge 1$,

$$0 \le C_n \le \tfrac{1}{2}, \qquad 0 \le D_n \le 1.$$

Proof. Since we assume $1/2 \le p \le 1$, the first fact follows from Corollary 3.1. The second fact is trivial. □

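Lemma 4.3 Let $n > 1$. If $C_{n-1} \le \frac{1}{3}$, then $C_n \le \frac{1}{3}$.

Proof. We rewrite Formula (10) as

$$2\left(\tfrac{1}{3} - C_n\right) + 3\left(\tfrac{1}{3} - C_{n-1}\right)^2 - (2p - 1)^2 D_{n-1}^2 = 0. \tag{12}$$

Applying (12) with $n - 1$ in place of $n$ and using the hypothesis $C_{n-1} \le \frac{1}{3}$, this implies that

$$0 \le 2\left(\tfrac{1}{3} - C_{n-1}\right) \le (2p - 1)^2 D_{n-2}^2,$$

and

$$4\left(\tfrac{1}{3} - C_{n-1}\right)^2 \le (2p - 1)^4 D_{n-2}^4.$$

By Lemma 4.2, we have that

$$\begin{aligned}
2C_n &= \tfrac{2}{3} + 3\left(\tfrac{1}{3} - C_{n-1}\right)^2 - (2p - 1)^2 D_{n-1}^2 \\
&\le \tfrac{2}{3} + \tfrac{3}{4}(2p - 1)^4 D_{n-2}^4 - (2p - 1)^2 \left[(2p - 1)(1 + C_{n-2}) D_{n-2}\right]^2 \\
&= \tfrac{2}{3} + (2p - 1)^4 \left(\tfrac{3}{4} D_{n-2}^2 - (1 + C_{n-2})^2\right) D_{n-2}^2 \\
&\le \tfrac{2}{3} + (2p - 1)^4 \left(\tfrac{3}{4} - (1 + 0)^2\right) D_{n-2}^2 \\
&\le \tfrac{2}{3},
\end{aligned}$$

and hence Lemma 4.3 follows. □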

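Lemma 4.4 Let $n \ge 1$. If $C_{n-1} \ge \frac{1}{3}$, then $C_n \le C_{n-1}$.

Proof.

$$\begin{aligned}
2C_n &= 1 - 2C_{n-1} + 3C_{n-1}^2 - (2p - 1)^2 D_{n-1}^2 \\
&= 2C_{n-1} + (1 - C_{n-1})(1 - 3C_{n-1}) - (2p - 1)^2 D_{n-1}^2 \\
&\le 2C_{n-1}.
\end{aligned}$$

□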

Theorem 4.1 Suppose $\frac{1}{2} \le p < \frac{7}{8}$. Then

$$\lim_{n \to \infty} C_n = \tfrac{1}{3}, \qquad \lim_{n \to \infty} D_n = 0.$$

Proof. The proof is divided into two cases.

Case 1: $C_n \ge 1/3$ for all $n \ge 1$. By Lemma 4.4, $C_n$ is a decreasing positive sequence and thus $\lim_{n \to \infty} C_n$
exists and its value is at least 1/3. The equality $2C_n = 1 - 2C_{n-1} + 3C_{n-1}^2 - (2p - 1)^2 D_{n-1}^2$ implies that
$\lim_{n \to \infty} D_n$ exists. Taking limits on all terms in (11) implies that $\lim_{n \to \infty} D_n = 0$ since $\lim_{n \to \infty} C_n \ge 1/3$.
Again, taking limits on all terms in (10) gives that

$$2 \lim_{n \to \infty} C_n = 1 - 2 \lim_{n \to \infty} C_n + 3 \left( \lim_{n \to \infty} C_n \right)^2 - 0;$$

that is, $\lim_{n \to \infty} C_n = 1/3$ or $1$. Since $C_n$ is decreasing and $C_1 = 2p(1 - p) \le 1/2$, $\lim_{n \to \infty} C_n \ne 1$. Thus
$\lim_{n \to \infty} C_n = 1/3$.

Case 2: $C_N < 1/3$ for some $N \ge 1$. By Lemma 4.3, $C_n \le 1/3$ for all $n \ge N$. Formula (11) implies that

$$D_n = (2p - 1)(1 + C_{n-1}) D_{n-1} \le \left[ \tfrac{4}{3}(2p - 1) \right] D_{n-1}$$

for any $n > N$. Since $1/2 \le p < 7/8$, $\frac{4}{3}(2p - 1) < 1$ and hence $\lim_{n \to \infty} D_n = 0$.
By Formula (12),

$$2\left(\tfrac{1}{3} - C_n\right) = (2p - 1)^2 D_{n-1}^2 - 3\left(\tfrac{1}{3} - C_{n-1}\right)^2$$

and hence

$$0 \le 2\left(\tfrac{1}{3} - C_n\right) \le (2p - 1)^2 D_{n-1}^2$$

for all $n > N$. Since

$$\lim_{n \to \infty} (2p - 1)^2 D_{n-1}^2 = (2p - 1)^2 \left( \lim_{n \to \infty} D_{n-1} \right)^2 = 0,$$

by the Sandwich Theorem

$$\lim_{n \to \infty} 2\left(\tfrac{1}{3} - C_n\right) = 0,$$

that is, $\lim_{n \to \infty} C_n = 1/3$. □

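To prove the convergence of $C_n$ and $D_n$ for $p \ge \frac{7}{8}$, we set

$$c_n = \frac{2(1 - p)}{2p - 1} - C_n$$

and

$$d_n = D_n^2.$$

Then, Formula (10), written in the form $2C_n = \frac{2}{3} + 3\left(\frac{1}{3} - C_{n-1}\right)^2 - (2p - 1)^2 D_{n-1}^2$, implies that

$$\begin{aligned}
\frac{4(1 - p)}{2p - 1} - 2c_n &= \frac{2}{3} + 3\left(\frac{1}{3} - \frac{2(1 - p)}{2p - 1} + c_{n-1}\right)^2 - (2p - 1)^2 d_{n-1} \\
&= \frac{2}{3} + 3\left(\frac{8p - 7}{3(2p - 1)} + c_{n-1}\right)^2 - (2p - 1)^2 d_{n-1} \\
&= \frac{2}{3} + \frac{(8p - 7)^2}{3(2p - 1)^2} + \frac{2(8p - 7)}{2p - 1} c_{n-1} + 3c_{n-1}^2 - (2p - 1)^2 d_{n-1},
\end{aligned}$$

or equivalently

$$2c_n = (2p - 1)^2 d_{n-1} - \frac{2(8p - 7)}{2p - 1} c_{n-1} - 3c_{n-1}^2 - \frac{(8p - 7)(4p - 3)}{(2p - 1)^2}. \tag{13}$$

Formula (11) implies that

$$\begin{aligned}
d_n &= (2p - 1)^2 \left( \frac{1}{2p - 1} - c_{n-1} \right)^2 d_{n-1} \\
&= \left[ 1 - (2p - 1) c_{n-1} \right]^2 d_{n-1}.
\end{aligned} \tag{14}$$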

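Lemma 4.5 For any $k \ge 2$ and $p \ge 7/8$,

(1) $c_k \ge 0$.

(2) $d_{k+1} \le d_k$.

(3) $c_k \le \frac{5(1 - p)}{4(2p - 1)}$.

Proof. We prove it by induction on $k$. The facts are obviously true for $k = 2, 3$. Assume they hold for $k \le n - 1$.
We now prove they hold for $k = n$.

(1) By induction, $0 \le c_{n-2}, c_{n-1} \le \frac{5(1 - p)}{4(2p - 1)}$. Hence,

$$\begin{aligned}
\left[1 - (2p - 1) c_{n-2}\right]^2 - \frac{8p - 7}{2p - 1} - \frac{3}{2} c_{n-1}
&= \frac{6(1 - p)}{2p - 1} - 2(2p - 1) c_{n-2} - \frac{3}{2} c_{n-1} + (2p - 1)^2 c_{n-2}^2 \\
&\ge \frac{6(1 - p)}{2p - 1} - 2(2p - 1) \cdot \frac{5(1 - p)}{4(2p - 1)} - \frac{3}{2} \cdot \frac{5(1 - p)}{4(2p - 1)} \\
&= \frac{1 - p}{2p - 1} \cdot \frac{53 - 40p}{8} \\
&\ge 0.
\end{aligned} \tag{15}$$

Setting $A = \frac{(8p - 7)(4p - 3)}{(2p - 1)^2}$, we have

$$\begin{aligned}
2c_n &= (2p - 1)^2 d_{n-1} - \frac{2(8p - 7)}{2p - 1} c_{n-1} - 3c_{n-1}^2 - A \\
&= (2p - 1)^2 d_{n-1} - 2c_{n-1} \left( \frac{8p - 7}{2p - 1} + \frac{3}{2} c_{n-1} \right) - A.
\end{aligned}$$

By using recurrences (13) and (14), we obtain that

$$\begin{aligned}
2c_n ={}& (2p - 1)^2 \left(1 - (2p - 1) c_{n-2}\right)^2 d_{n-2} \\
&- \left[ \frac{8p - 7}{2p - 1} + \frac{3}{2} c_{n-1} \right] \times \left[ (2p - 1)^2 d_{n-2} - \frac{2(8p - 7)}{2p - 1} c_{n-2} - 3c_{n-2}^2 - A \right] - A \\
={}& (2p - 1)^2 \left[ \left(1 - (2p - 1) c_{n-2}\right)^2 - \frac{8p - 7}{2p - 1} - \frac{3}{2} c_{n-1} \right] d_{n-2} \\
&+ \left[ \frac{8p - 7}{2p - 1} + \frac{3}{2} c_{n-1} \right] \left[ \frac{2(8p - 7)}{2p - 1} c_{n-2} + 3c_{n-2}^2 + A \right] - A.
\end{aligned}$$

Since $c_{n-1} \ge 0$, Formula (13) implies that

$$(2p - 1)^2 d_{n-2} \ge \frac{2(8p - 7)}{2p - 1} c_{n-2} + 3c_{n-2}^2 + A.$$

This inequality and (15) imply that

$$\begin{aligned}
2c_n \ge{}& \left[ \left(1 - (2p - 1) c_{n-2}\right)^2 - \frac{8p - 7}{2p - 1} - \frac{3}{2} c_{n-1} \right] \left[ \frac{2(8p - 7)}{2p - 1} c_{n-2} + 3c_{n-2}^2 + A \right] \\
&+ \left[ \frac{8p - 7}{2p - 1} + \frac{3}{2} c_{n-1} \right] \left[ \frac{2(8p - 7)}{2p - 1} c_{n-2} + 3c_{n-2}^2 + A \right] - A \\
={}& \frac{8(8p - 7)(1 - p)}{2p - 1} c_{n-2} + \left[ 3 + (8p - 7)(4p - 7) \right] c_{n-2}^2 \\
&+ 4(2p - 1)(4p - 5) c_{n-2}^3 + 3(2p - 1)^2 c_{n-2}^4.
\end{aligned}$$

By assumption, $c_{n-2} \le \frac{5(1 - p)}{4(2p - 1)}$ and $4p - 5 \le -1$. Replacing $c_{n-2}^3$ with $\frac{5(1 - p)}{4(2p - 1)} c_{n-2}^2$ in the right-hand side of
the last inequality, we have that

$$\begin{aligned}
2c_n \ge{}& \frac{8(8p - 7)(1 - p)}{2p - 1} c_{n-2} + \left[ 3 + (8p - 7)(4p - 7) \right] c_{n-2}^2 \\
&+ 5(1 - p)(4p - 5) c_{n-2}^2 + 3(2p - 1)^2 c_{n-2}^4 \\
={}& \frac{8(8p - 7)(1 - p)}{2p - 1} c_{n-2} + 3(1 - p)(9 - 4p) c_{n-2}^2 + 3(2p - 1)^2 c_{n-2}^4 \\
\ge{}& 0.
\end{aligned}$$

(2) We have proved that $c_n \ge 0$. Therefore,

$$d_{n+1} = \left[ 1 - (2p - 1) c_n \right]^2 d_n \le d_n.$$

(3) Since $d_k$ decreases for $k \le n$,

$$d_n \le d_2 = D_2^2 \le (2p - 1)^2. \tag{16}$$

Let $q = 1 - p$. Note that $p \ge \frac{7}{8}$ and $q \le \frac{1}{8}$. Therefore, we have that

$$\frac{1}{1 - 2q} \le \frac{4}{3}$$

and

$$16q(1 - 5q) \le 16 \times \tfrac{1}{10} \times \left( 1 - 5 \times \tfrac{1}{10} \right) = \tfrac{4}{5}.$$

Recalling that $c_{n-1} \ge 0$, by (13) and (16), we have that

$$\begin{aligned}
c_n &= \frac{1}{2} \left[ (2p - 1)^2 d_{n-1} - \frac{2(8p - 7)}{2p - 1} c_{n-1} - 3c_{n-1}^2 - \frac{(8p - 7)(4p - 3)}{(2p - 1)^2} \right] \\
&\le \frac{1}{2} \left[ (2p - 1)^2 d_{n-1} - \frac{(8p - 7)(4p - 3)}{(2p - 1)^2} \right] \\
&= \frac{1}{2(2p - 1)^2} \left[ (2p - 1)^4 d_{n-1} - (8p - 7)(4p - 3) \right] \\
&\le \frac{1}{2(2p - 1)^2} \left[ (2p - 1)^6 - (8p - 7)(4p - 3) \right] \\
&= \frac{1}{2(2p - 1)^2} \left[ (1 - 2q)^6 - (1 - 8q)(1 - 4q) \right] \\
&= \frac{1}{(2p - 1)^2} \left[ 2q^2 \left( 7 - 40q + 60q^2 - 48q^3 + 16q^4 \right) \right] \\
&\le \frac{1}{(2p - 1)^2} \left[ 2q^2 \left( 7 - 40q + 60q^2 + 16q^4 \right) \right] \\
&\le \frac{1}{(2p - 1)^2} \left[ 2q^2 \left( 7 - 40q + \tfrac{60}{64} + \tfrac{1}{256} \right) \right] \\
&\le \frac{1}{(2p - 1)^2} \left[ 2q^2 (8 - 40q) \right] \\
&= \frac{q}{2p - 1} \cdot \frac{16q(1 - 5q)}{1 - 2q} \\
&\le \frac{4q}{5(2p - 1)} \cdot \frac{1}{1 - 2q}.
\end{aligned}$$

Since $q \le \frac{1}{8}$ and $\frac{1}{1 - 2q} \le \frac{4}{3}$, $c_n \le \frac{16(1 - p)}{15(2p - 1)} \le \frac{5(1 - p)}{4(2p - 1)}$. □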

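Theorem 4.2 Suppose $\frac{7}{8} \le p \le 1$. Then

$$\lim_{n \to \infty} C_n = \frac{2(1 - p)}{2p - 1}$$

and

$$\lim_{n \to \infty} D_n^2 = \frac{(8p - 7)(4p - 3)}{(2p - 1)^4}.$$

Proof. Since $c_n \ge 0$ for all $n$, Formula (13) implies

$$d_n \ge \frac{(8p - 7)(4p - 3)}{(2p - 1)^4}$$

for all $n$. Since $d_n = D_n^2$ is a decreasing sequence, $\lim_{n \to \infty} d_n$ exists and is at least $\frac{(8p - 7)(4p - 3)}{(2p - 1)^4}$, which is larger
than $0$ for $p > \frac{7}{8}$. Since $0 \le c_n \le 1$,

$$0 \le 1 - (2p - 1) c_n \le 1.$$

For $p > \frac{7}{8}$, Formula (14) implies that

$$\lim_{n \to \infty} \left[ 1 - (2p - 1) c_n \right] = 1$$

and so

$$\lim_{n \to \infty} c_n = 0.$$

Hence, $\lim_{n \to \infty} C_n = \frac{2(1 - p)}{2p - 1}$, and taking limits on all terms in (13) gives $\lim_{n \to \infty} D_n^2 = \frac{(8p - 7)(4p - 3)}{(2p - 1)^4}$.

For $p = \frac{7}{8}$, Formulas (13) and (14) become

$$2c_n + 3c_{n-1}^2 = \tfrac{9}{16} d_{n-1}$$

and

$$d_n = d_{n-1} \left( 1 - \tfrac{3}{4} c_{n-1} \right)^2.$$

As a decreasing sequence, $d_n$ has a non-negative limit. If $\lim_{n \to \infty} d_n = 0$, by the Sandwich Theorem,
$\lim_{n \to \infty} c_n = 0$ from the fact that $0 \le 2c_n \le \frac{9}{16} d_{n-1}$. Therefore,

$$\lim_{n \to \infty} C_n = \frac{2(1 - p)}{2p - 1}, \qquad \lim_{n \to \infty} D_n^2 = 0 = \frac{(8p - 7)(4p - 3)}{(2p - 1)^4}.$$

If $\lim_{n \to \infty} d_n > 0$, then

$$d_n = d_{n-1} \left( 1 - \tfrac{3}{4} c_{n-1} \right)^2$$

implies that $\lim_{n \to \infty} c_n = 0$ and hence $\lim_{n \to \infty} d_n = 0$, a contradiction. □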
Theorem 4.3 Let $T_n$ be the complete binary tree of $2^n$ leaves in which the conservation rate is $p$ along each
branch. In the two-state Jukes-Cantor model,

(a) (Steel [9]) the accuracy of the Fitch method for reconstructing the root state in $T_n$ converges as $n$ goes
to infinity to $\frac{1}{2} + \frac{1}{2(2p - 1)^2} \sqrt{(8p - 7)(4p - 3)}$ if $p \in [\frac{7}{8}, 1]$ and to $\frac{1}{2}$ if $p \in [\frac{1}{2}, \frac{7}{8}]$;

(b) it diverges as $n$ goes to infinity if $p \in (0, \frac{1}{8})$.

Proof. By Formula (3) and the definition of $D_n$,

$$RA_F(T_n) = \tfrac{1}{2} + \tfrac{1}{2} D_n.$$

Hence, the fact (a) follows from Theorems 4.1 and 4.2.

When $0 < p < \frac{1}{8}$, we have $2p - 1 < 0$, so by (11) $D_n > 0$ for even integers $n$ and $D_n < 0$ for odd integers $n$. By Lemma 4.1 and
Theorem 4.2 (applied to $1 - p > \frac{7}{8}$), $|D_n|$ converges to a positive number. Hence $D_n$ and $RA_F(T_n)$ diverge. □
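The limit in part (a) and the oscillation in part (b) can be illustrated numerically. The sketch below evaluates the closed-form limit and compares it with $RA_F(T_n) = \frac{1}{2} + \frac{1}{2} D_n$ obtained by iterating (10) and (11). The function names `limit_accuracy` and `accuracy_sequence` are assumptions made for this example.

```python
import math

def limit_accuracy(p):
    """Closed-form limit of RA_F(T_n) for 1/2 <= p <= 1, as in Theorem 4.3(a)."""
    if not 0.5 <= p <= 1.0:
        raise ValueError("the closed form is stated only for 1/2 <= p <= 1")
    if p <= 7.0 / 8.0:
        return 0.5
    return 0.5 + math.sqrt((8 * p - 7) * (4 * p - 3)) / (2 * (2 * p - 1) ** 2)

def accuracy_sequence(p, n_max):
    """RA_F(T_n) = 1/2 + D_n / 2 for n = 0, ..., n_max, iterating (10) and (11)."""
    c, d = 0.0, 1.0                       # leaf values, formula (6)
    m = 2.0 * p - 1.0
    out = [0.5 + 0.5 * d]
    for _ in range(n_max):
        c, d = 0.5 * (1 - 2 * c + 3 * c * c - (m * d) ** 2), m * (1 + c) * d
        out.append(0.5 + 0.5 * d)
    return out

converging = accuracy_sequence(0.95, 200)   # approaches limit_accuracy(0.95)
oscillating = accuracy_sequence(0.05, 200)  # alternates above and below 1/2
```

For $p = 0.05$ the sign of $D_n$ alternates with the parity of $n$ while $|D_n|$ matches the sequence for $1 - p = 0.95$, so the even-indexed and odd-indexed accuracies settle at two different values.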



5 The reconstruction accuracy on ultrametric trees 

We now consider the accuracy of reconstructing the root state in ultrametric phylogenies. In an ultrametric
phylogeny $T$, a branch $xy$ has a length $t_{xy}$, and all the leaves have the same distance from the root. Under
the two-state Jukes-Cantor model, the conservation probability $p_{xy}$ along a branch $xy$ of length $t_{xy}$ is

$$p_{xy} = \tfrac{1}{2} \left( 1 + e^{-2\lambda t_{xy}} \right),$$

where $\lambda$ is a constant, representing the substitution rate in $T$. For an internal node $u$ of $T$, the distance
between it and any of its leaf descendants is defined as its depth, denoted by $d(u)$.

Lemma 5.1 Let $T$ be an ultrametric phylogeny and $x$ an internal node. Under the 2-state Jukes-Cantor
model, for any path $P(x, y)$ from $x$ to its leaf descendant $y$,

$$\prod_{uv \in P(x, y)} (2p_{uv} - 1) = e^{-2\lambda d(x)}. \tag{17}$$

Proof. It follows from the facts that $2p_{uv} - 1 = e^{-2\lambda t_{uv}}$ for each edge $uv$ and that $d(x) = \sum_{uv \in P(x, y)} t_{uv}$. □

Let $T$ be an ultrametric tree that has three or more leaves. For any internal node $w$ with children $w_1$ and
$w_2$, by Formula (8) and Lemma 5.1, we have that

$$D_w \ge \tfrac{1}{2} (2p_{ww_1} - 1) D_{w_1} + \tfrac{1}{2} (2p_{ww_2} - 1) D_{w_2} \tag{18}$$

because $C_{w_1}, C_{w_2} \ge 0$. By induction, we can show the following fact from Formula (18).

Lemma 5.2

$$D_w \ge \prod_{(u, v) \in P(w, l)} (2p_{uv} - 1) = e^{-2\lambda d(w)},$$

where $l$ is a leaf below $w$.
By Formula (3), the above lemma implies that the accuracy of reconstructing the root state from all the
leaf states is not less than that from a single leaf. This fact was established by Fischer and Thatte in [2]. It can
be strengthened as follows.

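Theorem 5.1 Let $T$ be an ultrametric tree having three or more leaves and let $x$ be a child of its root $r$. If $x$
has two children, then

$$D_r \ge e^{-2\lambda d(r)} \left[ 1 + \tfrac{1}{4} \left( 1 - e^{-4\lambda d(x)} \right) \right]. \tag{19}$$

Proof. Let $y$ be the other child of $r$. By Lemma 5.2, $D_y \ge e^{-2\lambda d(y)}$. Since $C_y \ge 0$, by Formula (8), we have that

$$\begin{aligned}
D_r &= \tfrac{1}{2} (2p_{rx} - 1)(1 + C_y) D_x + \tfrac{1}{2} (2p_{ry} - 1)(1 + C_x) D_y \\
&\ge \tfrac{1}{2} (2p_{rx} - 1) D_x + \tfrac{1}{2} e^{-2\lambda d(r)} (1 + C_x).
\end{aligned} \tag{20}$$

Let $u$ and $v$ be the children of $x$. By Lemma 5.2, $D_u \ge e^{-2\lambda d(u)}$ and $D_v \ge e^{-2\lambda d(v)}$. Write

$$D_u = e^{-2\lambda d(u)} (1 + \Delta(u)), \qquad D_v = e^{-2\lambda d(v)} (1 + \Delta(v)),$$

where $\Delta(u), \Delta(v) \ge 0$. We then have

$$\begin{aligned}
D_x &= \tfrac{1}{2} (2p_{xu} - 1)(1 + C_v) D_u + \tfrac{1}{2} (2p_{xv} - 1)(1 + C_u) D_v \\
&= e^{-2\lambda d(x)} \left\{ \tfrac{1}{2} \left[ 1 + C_v + \Delta(u) + C_v \Delta(u) \right] + \tfrac{1}{2} \left[ 1 + C_u + \Delta(v) + C_u \Delta(v) \right] \right\} \\
&\ge e^{-2\lambda d(x)} \left\{ 1 + \tfrac{1}{2} \left[ C_u + C_v + \Delta(u) + \Delta(v) \right] \right\}.
\end{aligned} \tag{21}$$

Combining Formulas (20) and (21) gives that

$$D_r \ge e^{-2\lambda d(r)} \left\{ 1 + \tfrac{1}{2} C_x + \tfrac{1}{4} \left[ C_u + C_v + \Delta(u) + \Delta(v) \right] \right\}.$$

By Formula (7),

$$\begin{aligned}
C_x &\ge \tfrac{1}{2} \left[ 1 - C_u - C_v - (2p_{xu} - 1)(2p_{xv} - 1) D_u D_v \right] \\
&= \tfrac{1}{2} \left\{ 1 - C_u - C_v - e^{-4\lambda d(x)} [1 + \Delta(u)][1 + \Delta(v)] \right\}.
\end{aligned}$$

We further have that

$$D_r \ge \tfrac{1}{4} e^{-2\lambda d(r)} \times \left\{ 5 + \Delta(u) + \Delta(v) - e^{-4\lambda d(x)} [1 + \Delta(u)][1 + \Delta(v)] \right\}.$$

Since $d(x) \ge d(v)$,

$$[1 + \Delta(v)]\, e^{-4\lambda d(x)} \le [1 + \Delta(v)]\, e^{-2\lambda d(v)} = D_v \le 1.$$

Therefore, we obtain that

$$\Delta(u) + \Delta(v) - e^{-4\lambda d(x)} [1 + \Delta(u)][1 + \Delta(v)] \ge -e^{-4\lambda d(x)}$$

and

$$D_r \ge \tfrac{1}{4} e^{-2\lambda d(r)} \left( 5 - e^{-4\lambda d(x)} \right) = e^{-2\lambda d(r)} \left[ 1 + \tfrac{1}{4} \left( 1 - e^{-4\lambda d(x)} \right) \right].$$

□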

It is known that there exists an ultrametric tree in which the root state can be reconstructed more accurately
from the states of a subset of four leaves than from all the leaf states. Let $l_1, l_2, l_3$ be three leaves in $T$.
Assume that the least common ancestor (lca) $t$ of $l_2$ and $l_3$ is not the root $r$ and has depth $d(t)$. If the
lca of $l_1$ and $t$ is the root, then the accuracy of reconstructing the root state from these three leaves is
$\frac{1}{2} + \frac{1}{2} e^{-2\lambda d(r)} \left[ 1 + \frac{1}{4} \left( 1 - e^{-4\lambda d(t)} \right) \right]$, which is at most $\frac{1}{2} + \frac{1}{2} e^{-2\lambda d(r)} \left[ 1 + \frac{1}{4} \left( 1 - e^{-4\lambda d(x)} \right) \right]$ because $d(x) \ge d(t)$. If
the lca of $l_1$ and $t$ is not $r$, the accuracy is even smaller. Therefore, Theorem 5.1 implies that the reconstruction
of the root state from all the leaf states is at least as accurate as from the states of any three leaves.
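The bound (19) and the three-leaf formula can be checked numerically on a small ultrametric tree. The sketch below computes $D_r$ through the recurrences (7) and (8), with $2p_{xy} - 1 = e^{-2\lambda t_{xy}}$. The tree encoding, the example branch lengths, the value of `LAM` and the function names are all assumptions chosen for illustration.

```python
import math

def c_d(node, lam):
    """(C, D) at `node` via (7)-(8); a leaf is None, an internal node is
    ((child, branch_length), (child, branch_length))."""
    if node is None:
        return 0.0, 1.0
    (x, t_x), (y, t_y) = node
    c_x, d_x = c_d(x, lam)
    c_y, d_y = c_d(y, lam)
    m_x = math.exp(-2.0 * lam * t_x)      # 2 p_X - 1 under two-state Jukes-Cantor
    m_y = math.exp(-2.0 * lam * t_y)
    c = 0.5 * (1 - c_x - c_y + 3 * c_x * c_y - m_x * m_y * d_x * d_y)
    d = 0.5 * m_x * (1 + c_y) * d_x + 0.5 * m_y * (1 + c_x) * d_y
    return c, d

def theorem_5_1_bound(depth_r, depth_x, lam):
    """Right-hand side of (19)."""
    return math.exp(-2.0 * lam * depth_r) * (
        1 + 0.25 * (1 - math.exp(-4.0 * lam * depth_x)))

LAM = 1.0                                          # assumed substitution rate
cherry = ((None, 0.3), (None, 0.3))                # depth 0.3
x = ((cherry, 0.3), (cherry, 0.3))                 # depth 0.6, four leaves
full_tree = ((x, 0.4), (None, 1.0))                # root depth 1.0, five leaves
three_leaves = ((((None, 0.6), (None, 0.6)), 0.4), (None, 1.0))

d_full = c_d(full_tree, LAM)[1]
d_three = c_d(three_leaves, LAM)[1]
bound = theorem_5_1_bound(1.0, 0.6, LAM)
```

In this example `d_three` coincides with `bound` up to rounding, since the three-leaf tree attains (19) with $d(t) = d(x)$, while `d_full` lies above it. By Formula (3), the corresponding accuracies are $\frac{1}{2} + \frac{1}{2} D_r$.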
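Appendix: Proof of Theorem 3.1

We first have that

$$(p_X \alpha_X + q_X \beta_X)(1 - \alpha_Y - \beta_Y) - (q_X \alpha_X + p_X \beta_X)(1 - \alpha_Y - \beta_Y) = (2p_X - 1) C_Y D_X, \tag{22}$$

$$(1 - \alpha_X - \beta_X)(p_Y \alpha_Y + q_Y \beta_Y) - (1 - \alpha_X - \beta_X)(q_Y \alpha_Y + p_Y \beta_Y) = (2p_Y - 1) C_X D_Y, \tag{23}$$

and

$$\begin{aligned}
&(p_X \alpha_X + q_X \beta_X)(p_Y \alpha_Y + q_Y \beta_Y) - (q_X \alpha_X + p_X \beta_X)(q_Y \alpha_Y + p_Y \beta_Y) \\
&\qquad = (p_X + p_Y - 1)(\alpha_X \alpha_Y - \beta_X \beta_Y) + (p_Y - p_X)(\beta_X \alpha_Y - \alpha_X \beta_Y).
\end{aligned} \tag{24}$$

Since

$$\alpha_X \alpha_Y - \beta_X \beta_Y = (\alpha_X - \beta_X) \alpha_Y + \beta_X (\alpha_Y - \beta_Y)$$

and

$$\beta_X \alpha_Y - \alpha_X \beta_Y = \alpha_X (\alpha_Y - \beta_Y) - (\alpha_X - \beta_X) \alpha_Y,$$

combining the equalities (22)-(24) given above leads to

$$D_Z = (2p_X - 1)(1 - \beta_Y) D_X + (2p_Y - 1)(1 - \alpha_X) D_Y + (p_Y - p_X) D_X D_Y.$$

By symmetry,

$$D_Z = (2p_X - 1)(1 - \alpha_Y) D_X + (2p_Y - 1)(1 - \beta_X) D_Y + (p_X - p_Y) D_X D_Y.$$

Averaging these two expressions, the $D_X D_Y$ terms cancel. Therefore,

$$\begin{aligned}
D_Z &= \tfrac{1}{2} (2p_X - 1)(2 - \alpha_Y - \beta_Y) D_X + \tfrac{1}{2} (2p_Y - 1)(2 - \alpha_X - \beta_X) D_Y \\
&= \tfrac{1}{2} (2p_X - 1)(1 + C_Y) D_X + \tfrac{1}{2} (2p_Y - 1)(1 + C_X) D_Y.
\end{aligned}$$

Moreover, we also have that

$$\begin{aligned}
\alpha_Z + \beta_Z ={}& (p_X p_Y + q_X q_Y)(\alpha_X \alpha_Y + \beta_X \beta_Y) + (q_X p_Y + p_X q_Y)(\beta_X \alpha_Y + \alpha_X \beta_Y) \\
&+ (\alpha_X + \beta_X)(1 - \alpha_Y - \beta_Y) + (1 - \alpha_X - \beta_X)(\alpha_Y + \beta_Y).
\end{aligned}$$

Since

$$\alpha_X \alpha_Y + \beta_X \beta_Y = \tfrac{1}{2} \left( (1 - C_X)(1 - C_Y) + D_X D_Y \right)$$

and

$$\beta_X \alpha_Y + \alpha_X \beta_Y = \tfrac{1}{2} \left( (1 - C_X)(1 - C_Y) - D_X D_Y \right),$$

we obtain that

$$1 - C_Z = \tfrac{1}{2} \left[ 1 + C_X + C_Y - 3 C_X C_Y + (2p_X - 1)(2p_Y - 1) D_X D_Y \right],$$

or equivalently

$$C_Z = \tfrac{1}{2} \left[ 1 - C_X - C_Y + 3 C_X C_Y - (2p_X - 1)(2p_Y - 1) D_X D_Y \right].$$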